This is a personal project that I started a year ago, out of curiosity and because I wanted something to practice my data analysis skills in R, the use of Markdown, and eventually also R Shiny.
The Birsmattehof farm in Therwil focuses on the production of organic vegetables. These can be bought directly at the farm or at its stands at various weekly markets around Basel, or they can be ordered for weekly delivery. 46 times a year we receive a basket full of freshly harvested organic vegetables, the contents of which are a surprise: the variety and amount are determined by the season. On average, our basket should contain 3.5 to 5 kilos per week.
Over the course of many months, I weighed all vegetables that we received in the basket. I then correlated these data with local meteorological data in hopes of finding some interesting trends.
More information about the farm and its products can be found on their website https://www.birsmattehof.ch/.
Birsmattehof, Therwil
For the transformation, analysis, and visualisation of the data several packages were used:
tidyverse, version 1.3.1
Citation: Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686
dplyr, version 1.0.5
Citation: Hadley Wickham, Romain Francois, Lionel Henry and Kirill Mueller (2021). dplyr: A Grammar of Data Manipulation. R package version 1.0.5. https://CRAN.R-project.org/package=dplyr
stringr, version 1.4.0
Citation: Hadley Wickham (2019). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.4.0. https://CRAN.R-project.org/package=stringr
lubridate, version 1.7.10
Citation: Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. URL https://www.jstatsoft.org/v40/i03/.
plotly, version 4.10.0
Citation: C. Sievert. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC Florida, 2020. URL https://plotly-r.com.
### Installing Packages:
# install.packages("tidyverse")
# install.packages("dplyr")
# install.packages("stringr")
# install.packages("lubridate")
# install.packages("plotly")
### Loading Packages
library(dplyr)
library(stringr)
library(tidyverse)
library(lubridate)
library(plotly)
### Obtaining Citations
# citation(package="dplyr", lib.loc = NULL, auto = NULL)
# citation(package="tidyverse", lib.loc = NULL, auto = NULL)
# citation(package="stringr", lib.loc = NULL, auto = NULL)
# citation(package="lubridate", lib.loc = NULL, auto = NULL)
# citation(package="plotly", lib.loc = NULL, auto = NULL)
Two raw data files were imported for this project.
The first data set, "BMH_Veggiedata_V1.csv", is a csv file containing all vegetable weights collected from the 10th of November 2020 to the 16th of November 2021. This file contains a column for the vegetable category (e.g. cabbage), a second column for the specific vegetable (e.g. red cabbage), and then one column of weights for each date on which a basket was collected.
The second file, "weather.csv", is a csv file containing meteorological data from Basel during the same time period. This data was downloaded from Meteoblue (meteoblue AG, Greifengasse 38, CH-4058, Basel, Switzerland) at https://www.meteoblue.com/en/weather/archive/export/basel_switzerland_2661604 on the 16.11.2021. This file contains a timestamp column with the date and hour of each measurement, and one column each for the measurements of temperature, sunshine, precipitation, snowfall, humidity, and cloud cover.
These data sets will be referred to as the "Vegetable Data Set" and the "Weather Data Set", respectively, from here on out.
# Setting working directory and listing files
setwd("C:/Users/gartens1/Desktop/BMH")
list.files()
# Vegetable Data
filepath <- "C:/Users/gartens1/Desktop/BMH/BMH_Veggidata_V3.csv"
rawdata <- read.csv(filepath, header = TRUE, sep = ";")
# Weather Data
filepath2 <- "C:/Users/gartens1/Desktop/BMH/weather.csv"
rawdata2 <- read.csv(filepath2, header = TRUE, sep = ",")
Tidy data dictates that each variable has its own column and each observation has its own row. This format provides a standard way of structuring a dataset, which allows for easier analysis. As neither of the two datasets is tidy, the first step of this analysis deals with wrangling the data into this format. The following sections describe each step of the transformation for both data sets.
With the head() function a sneak peek at the original dataset can be taken to see its structure. All unneeded columns and rows are removed. The last row contains the summed weight of all vegetables in each basket. The last three columns contain calculations done in the original Excel file. These rows and columns are not needed, as these calculations will be performed on the tidy data set using R code. So that the code also works should additional rows (e.g. new vegetable types) and columns (e.g. more measurement dates) be added to the original dataset, the variables nr and nc were created, where the former is the number of rows minus 1 and the latter is the number of columns minus 3. The original rawdata dataset was then subset accordingly and stored as data1.
The 3.5 to 5 kilos of vegetables that we receive weekly throughout the year are split into 46 baskets. The vegetables are delivered weekly except during the months of January and February, when the basket contains double the amount of produce but is only delivered every two weeks. Also, no basket is delivered in the week between Christmas and New Year's. To account for this, the recorded data from the 12.01.2021, 26.01.2021, 09.02.2021, 23.02.2021, and 09.03.2021 is halved and copied into the weeks of the 05.01.2021, 19.01.2021, 02.02.2021, 16.02.2021, and 02.03.2021, respectively.
# Taking a look at the original dataset
head(rawdata)
# Removing unnecessary columns/rows
nr <- nrow(rawdata) - 1
nc <- ncol(rawdata) - 3
data1 <- rawdata[c(1:nr),c(1:nc)]
# Splitting double-sized baskets into two weeks
data1$X05.01.2021 <- (data1$X12.01.2021)/2
data1$X12.01.2021 <- (data1$X12.01.2021)/2
data1$X19.01.2021 <- (data1$X26.01.2021)/2
data1$X26.01.2021 <- (data1$X26.01.2021)/2
data1$X02.02.2021 <- (data1$X09.02.2021)/2
data1$X09.02.2021 <- (data1$X09.02.2021)/2
data1$X16.02.2021 <- (data1$X23.02.2021)/2
data1$X23.02.2021 <- (data1$X23.02.2021)/2
data1$X02.03.2021 <- (data1$X09.03.2021)/2
data1$X09.03.2021 <- (data1$X09.03.2021)/2
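The ten assignments above all follow one pattern, so they can be sketched as a loop over pairs of dates. The example below is only a minimal illustration on a made-up data frame: toy and split_pairs are hypothetical names and the weights are invented. It assumes, as above, that each double-sized basket is halved and shared with the preceding week.

```r
# Toy stand-in for data1 (invented values, two delivery dates only)
toy <- data.frame(
  Vegetable   = c("Leek", "Kale"),
  X12.01.2021 = c(800, 400),
  X26.01.2021 = c(600, NA)
)

# Hypothetical lookup: name = week without delivery, value = double-sized delivery
split_pairs <- c(X05.01.2021 = "X12.01.2021",
                 X19.01.2021 = "X26.01.2021")

for (empty_week in names(split_pairs)) {
  delivered <- split_pairs[[empty_week]]
  half <- toy[[delivered]] / 2   # halve the double-sized basket
  toy[[empty_week]] <- half      # copy one half into the week without delivery
  toy[[delivered]]  <- half      # keep the other half on the delivery date
}

toy
```

Adding a further biweekly delivery then only requires one more entry in the lookup vector.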
To tidy the vegetable dataset, I first generated a vector cnames with all except the first two column names using colnames(). The dates in the column names contain the letter "X" (e.g. "X02.02.2021"). Using str_remove() this part of the string was removed. The format of the vector was then changed to "Date" using the as.Date() command. Each date was then repeated nr times (once per vegetable row) using rep(), and this repeating vector was transformed into the dataframe date using as.data.frame().
Similarly, two dataframes vegcat and veg were generated, containing the vegetable categories and vegetables (columns 1 and 2, respectively) repeated "nc-2" times (the first two columns were subtracted from "nc" to account for the columns with the vegetable names and their categories).
The dataframe weight containing all weight measurements was generated by unlisting all columns (except 1 and 2) of data1.
These four dataframes were combined using cbind() to generate the dataframe veggiedata. Using the command na.omit(), all rows containing NA values were removed. Finally, the four columns of the tidy dataframe veggiedata were renamed appropriately with the command colnames().
# generating date vector and removing "X"
cnames <- colnames(data1[-c(1,2)])
cnames <- str_remove(cnames, "[X]")
cnames <- as.Date(cnames, format = "%d.%m.%Y")
# generating dataframe of dates repeated "nr" times
date <- as.data.frame(rep(cnames, each = nr))
# generating dataframes of all vegetables and vegetable categories "nc" minus 2 times.
vegcat <- as.data.frame(rep(data1$Vegetable.Category[1:nr], times = nc-2))
veg <- as.data.frame(rep(data1$Vegetable[1:nr], times = nc-2))
# generating a dataframe of all weights
weight <- as.data.frame(unlist(data1[,-c(1,2)]))
# cbind and remove all NAs
veggiedata <- na.omit(cbind(date, vegcat, veg, weight))
# rename columns
colnames(veggiedata) <- c("Date", "Vegetable_Category", "Vegetable", "Weight")
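The same wide-to-long reshaping can also be sketched with pivot_longer() from tidyr, which is loaded as part of the tidyverse. This is an alternative illustration rather than the method used above; toy and toy_long are hypothetical names and the weights are invented. It assumes the date columns all start with "X", as in the imported file.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for data1 (invented values)
toy <- data.frame(
  Vegetable.Category = c("Cabbage", "Root Vegetable"),
  Vegetable          = c("Red Cabbage", "Beetroot"),
  X10.11.2020        = c(500, NA),
  X17.11.2020        = c(NA, 300)
)

toy_long <- toy %>%
  pivot_longer(cols = starts_with("X"),   # all date columns
               names_to = "Date",
               names_prefix = "X",        # strip the leading "X"
               values_to = "Weight",
               values_drop_na = TRUE) %>% # same effect as na.omit() here
  mutate(Date = as.Date(Date, format = "%d.%m.%Y")) %>%
  rename(Vegetable_Category = Vegetable.Category)

toy_long
```

Because the column names are carried along with the values, this approach does not depend on the repeat counts nr and nc matching the current shape of the data.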
For neatness, the environment was cleared of all unneeded objects except for the final tidy veggiedata dataframe.
# remove all unneeded objects from environment
rm(data1, date, rawdata, veg, vegcat, weight, cnames, filepath, nc, nr)
With the head() function a sneak peek at the original dataset can be taken to see its structure; the number of rows shown was set to 15 so that the format is properly visible. The first 9 rows of the dataset are not needed and were deleted to generate data2. Next, the date-time column was transformed into a properly formatted date: str_split_fixed() was used to split the original column at the letter "T" into two columns, date and time, and the date column was converted to date format using ymd() (the time column was not needed for this project). This date column was bound to the data using cbind() to generate the dataframe weather, and the original, messy date-time column was removed. The class of the columns containing the weather data was changed to numeric using transform(). Finally, all columns were renamed appropriately using colnames().
# Taking a look at the original dataset
head(rawdata2, 15)
# Remove unnecessary rows at the top
data2 <- rawdata2[-c(1:9),]
# fix date/time issue: split string and generate column only with date
date2 <- as.data.frame(str_split_fixed(data2[,1], "T", 2))
date2 <- ymd(date2[,1])
# bind date column to the dataframe and remove the original date-time column
weather <- cbind(date2, data2)
weather <- weather[,-c(2)]
# change class of the measurement columns from character to numeric
weather <- transform(weather,
                     Basel = as.numeric(Basel),
                     Basel.1 = as.numeric(Basel.1),
                     Basel.2 = as.numeric(Basel.2),
                     Basel.3 = as.numeric(Basel.3),
                     Basel.4 = as.numeric(Basel.4),
                     Basel.5 = as.numeric(Basel.5))
# rename columns appropriately
colnames(weather) <- c("Date", "Temperature_C", "Sunshine_min", "Precipitation_mm", "Snowfall_cm", "Humidity_per", "Cloudcover_per")
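The date extraction described above can be illustrated in isolation with base R. This is a minimal sketch that assumes the timestamps have the compact form "20201110T0000" (date, the letter "T", then the time); the values in stamps are invented and the exact format of the export should be checked against the raw file.

```r
# Invented timestamps in the assumed "YYYYMMDD" + "T" + "HHMM" layout
stamps <- c("20201110T0000", "20201110T0100", "20201111T0000")

# Drop everything from the "T" onwards, then parse the remaining date part
dates <- as.Date(sub("T.*$", "", stamps), format = "%Y%m%d")

dates
```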
Although the weather dataframe now appears neat, it is not yet in the tidy format. In the next steps, individual dataframes were generated for each weather condition with just one value per day (e.g. the average daily temperature, the daily sum of precipitation, etc.). For the measurements of temperature, humidity, and cloud cover, the daily average values were calculated by grouping the weather dataset by Date and calculating the mean. Similarly, for the measurements of sunshine, snow, and precipitation, the daily totals were calculated by grouping the weather dataset by Date and calculating the sum. In order to generate the final weather2 dataframe, an additional column with the name of the measurement (e.g. "Temperature_DailyAverage", "Precipitation_DailySum") was made and placed between the date and the measurement value columns. Using nrow() the number of rows (i.e. the number of measurement days) was calculated and stored as days. This variable was then used to generate the correct length for the column containing the "Condition". For each weather condition the three columns were renamed to "Date", "Condition", and "Measurement". The six individual dataframes were then joined using rbind() to generate the final compact tidy dataframe weather2.
# Temperature: daily average temperature (average)
temperature <- weather %>%
group_by(Date) %>%
summarise(Temperature_C = mean(Temperature_C)) %>%
ungroup()
days <- nrow(temperature)
temp1 <- as.data.frame(rep("Temperature_DailyAverage", times = days))
temperature <- cbind(temperature$Date, temp1, temperature$Temperature_C)
colnames(temperature) <- c("Date", "Condition", "Measurement")
# Sunshine: daily sunshine minutes (sum)
sunshine <- weather %>%
  group_by(Date) %>%
  summarise(Sunshine_min = sum(Sunshine_min)) %>%
  ungroup()
# labelled as a daily sum, since the minutes are added up rather than averaged
sun1 <- as.data.frame(rep("Sunshine_DailySum", times = days))
sunshine <- cbind(sunshine$Date, sun1, sunshine$Sunshine_min)
colnames(sunshine) <- c("Date", "Condition", "Measurement")
# Precipitation: daily precipitation (sum)
precipitation <- weather %>%
group_by(Date) %>%
summarise(Precipitation_mm = sum(Precipitation_mm)) %>%
ungroup()
precip1 <- as.data.frame(rep("Precipitation_DailySum", times = days))
precipitation <- cbind(precipitation$Date, precip1, precipitation$Precipitation_mm)
colnames(precipitation) <- c("Date", "Condition", "Measurement")
# Snowfall: daily snowfall (sum)
snowfall <- weather %>%
group_by(Date) %>%
summarise(Snowfall_cm = sum(Snowfall_cm)) %>%
ungroup()
snow1 <- as.data.frame(rep("Snowfall_DailySum", times = days))
snowfall <- cbind(snowfall$Date, snow1, snowfall$Snowfall_cm)
colnames(snowfall) <- c("Date", "Condition", "Measurement")
# Humidity: daily average humidity (average)
humidity <- weather %>%
group_by(Date) %>%
summarise(Humidity_per = mean(Humidity_per)) %>%
ungroup()
hum1 <- as.data.frame(rep("Humidity_DailyAverage", times = days))
humidity <- cbind(humidity$Date, hum1, humidity$Humidity_per)
colnames(humidity) <- c("Date", "Condition", "Measurement")
# Cloud Cover: daily average cloud cover (average)
cloudcover <- weather %>%
group_by(Date) %>%
summarise(Cloudcover_per = mean(Cloudcover_per)) %>%
ungroup()
cloud1 <- as.data.frame(rep("Cloudcover_DailyAverage", times = days))
cloudcover <- cbind(cloudcover$Date, cloud1, cloudcover$Cloudcover_per)
colnames(cloudcover) <- c("Date", "Condition", "Measurement")
# make one dataframe
weather2 <- rbind(cloudcover, precipitation, humidity, temperature, snowfall, sunshine)
# average <- weather %>%
# group_by(weather$date3) %>%
# summarise_all(.funs = mean) %>%
# ungroup()
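The six near-identical blocks above can be condensed by computing all daily values in one summarise() call and then reshaping with pivot_longer(). The sketch below shows the idea on an invented two-day, two-condition data frame; toy_weather and toy_daily are hypothetical names, and it is an alternative illustration rather than the code used to build weather2.

```r
library(dplyr)
library(tidyr)

# Invented hourly-style measurements for two days
toy_weather <- data.frame(
  Date             = as.Date(c("2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02")),
  Temperature_C    = c(1, 3, 5, 7),
  Precipitation_mm = c(0.5, 1.5, 0, 2)
)

toy_daily <- toy_weather %>%
  group_by(Date) %>%
  summarise(Temperature_DailyAverage = mean(Temperature_C),    # averaged per day
            Precipitation_DailySum   = sum(Precipitation_mm),  # summed per day
            .groups = "drop") %>%
  pivot_longer(cols = -Date,
               names_to = "Condition",
               values_to = "Measurement")

toy_daily
```

The result has the same three columns as weather2 ("Date", "Condition", "Measurement"), with rows ordered by date rather than by condition.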
For neatness, the environment was cleared of all unneeded objects except for the final tidy weather2 dataframe.
# remove all unneeded objects from environment
rm(weather, cloud1, cloudcover, hum1, humidity, precip1, precipitation, snow1, snowfall, sun1, sunshine, temp1, temperature, data2, rawdata2, days, filepath2, date2)
In this section, I decided to explore the veggiedata dataset and see if I can find interesting correlations with the weather2 dataset. For this exploratory data analysis I used ggplot() to graphically display the data.
To visualise the distribution of different vegetable categories in each basket throughout the entire measurement period, the plot allveg1 was generated. First, the weightsum dataframe was generated from veggiedata by grouping all weights by Date and using the summarise() function to calculate the total weight of each basket. Using mean() the average weight of the baskets was calculated and assigned to averageweight. Using ggplot() the barplot allveg1 was generated. A horizontal line marking the average basket weight was added to the plot using geom_hline() with the mean basket weight as its y-intercept.
# Calculating total basket weight per measurement date
weightsum <- veggiedata %>%
group_by(Date) %>%
summarise(Weight = sum(Weight)) %>%
ungroup()
weightsumtotal <- sum(weightsum$Weight)
# Calculating the average weight of each basket
averageweight <- round(mean(weightsum$Weight), digits = 1)
# Plotting:
allveg1 <- ggplot(data=veggiedata,
aes(x=Date, y=Weight)) +
geom_bar(stat = "identity",
aes(fill = Vegetable_Category), color = "black") +
xlab("Date") +
ylab("Weight (gr)") +
ggtitle("Distribution of Vegetable Types throughout the Year") +
theme(legend.key.size = unit(0.25, 'cm'),
axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
axis.title.x = element_text(vjust = 0)) +
scale_x_date(date_breaks = "1 month",
date_labels = "%m/%y") +
scale_y_continuous(name = "Weight (gr)", breaks = seq(0, 7000, 1000)) +
geom_hline(yintercept = mean(weightsum$Weight),
col = "black",
lwd = 0.5,
linetype = "dotted") +
annotate("text",
x = as.Date("2021-11-23"),
y = c(5200,5000, 4600),
label = c("Mean Basket","Weight (gr):", averageweight),
size = 2.7)
allveg1
To take a closer look at the vegetable distribution in the basket throughout the year, subsets of veggiedata were plotted and compared with the daily average temperature from the weather2 dataset. By filtering by the term Cabbage in Vegetable_Category, the dataframe cabbage was generated from veggiedata. Similarly, by filtering by the term Temperature_DailyAverage in Condition in the weather2 dataframe, the dataframe temp was generated. Using ggplot() the barplot plotcabbage was created showing the distribution of cabbage types over time. A line graph of the daily average temperature was then overlaid on this plot to create cabbagetemp.
# Cabbage Types and Temperature
cabbage <- veggiedata %>%
filter (Vegetable_Category == "Cabbage")
cabbage$Vegetable <- gsub("Giant Kohlrabi", "Kohlrabi", cabbage$Vegetable)
cabbagecolours <- c("Broccoli" = "turquoise4", "Cauliflower" = "mediumpurple2", "Chinacabbage" = "darkseagreen2", "Red Cabbage" = "lightpink4", "White Cabbage" = "cornsilk2", "Green Cabbage" = "chartreuse2", "Brusselsprouts" = "seagreen4", "Kale" = "darkslategrey", "Sauerkraut" = "thistle3", "Kohlrabi" = "lightpink2")
# Dashed season boundaries on the first day of Dec, Mar, Jun, Sep, and Dec.
# (An expression such as 20/12 is evaluated as a division, which shifts the date by a few days.)
xint <- as.Date(c("2020-12-01", "2021-03-01", "2021-06-01", "2021-09-01", "2021-12-01"))
plotcabbage <- ggplot(data=cabbage, aes(x=Date, y = Weight)) +
geom_bar(stat = "identity", aes(fill = Vegetable), color = "black") +
xlab("Date") +
ylab("Weight (gr)") +
ggtitle("The Distribution of Cabbage Types \n throughout the Seasons of the Year") +
scale_y_continuous(breaks = seq(0,4000, by = 250)) +
scale_x_date(date_breaks = "1 month", date_labels = "%m/%y") +
scale_fill_manual(name = "Cabbage Type",
values = cabbagecolours) +
theme(plot.title = element_text(face = "bold", hjust = 0.5),
legend.title = element_blank(),
axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5),
axis.title.x = element_text(size = 10),
axis.text.y = element_text(hjust = 1.2),
axis.title.y = element_text(size = 10, vjust = 1),
# legend.position = c(0.83,0.79),
legend.key.size = unit(0.2, 'cm'),
legend.text = element_text(size = 6.5),
legend.background = element_blank(),
panel.background = element_rect(fill = "grey96")) +
geom_vline(xintercept = xint,
linetype = "dashed") +
  # annotate() draws each season label once instead of once per data row
  annotate("text",
           x = as.Date(c("2021-01-15", "2021-04-15", "2021-07-15", "2021-10-15")),
           y = 3500,
           label = c("Winter", "Spring", "Summer", "Autumn"))
#plotcabbage
temp <- weather2 %>%
  filter(Condition == "Temperature_DailyAverage")
ylim.veg <- c(0, 4000)
ylim.temp <- c(-5, 30)
b <- diff(ylim.veg)/diff(ylim.temp)
a <- ylim.veg[1] - b*ylim.temp[1]
# Overlay temperature, rescaled onto the weight axis; after_stat(y) replaces the deprecated ..y..
cabbagetemp <- plotcabbage +
  geom_line(data = temp, aes(y = a + Measurement*b, colour = after_stat(y)),
            linetype = "solid", alpha = 0.6, show.legend = FALSE) +
  geom_smooth(method = "loess", data = temp, aes(y = a + Measurement*b, colour = after_stat(y)),
              linetype = "solid", alpha = 0.6, show.legend = FALSE) +
  scale_y_continuous("Weight (gr)",
                     sec.axis = sec_axis(~(. - a)/b,
                                         name = "Average Daily Temperature (C)")) +
  scale_colour_gradient(low = "purple", high = "orange")
cabbagetemp
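The values a and b used above define a linear mapping that places the temperature range of -5 to 30 degrees onto the weight axis from 0 to 4000, and sec_axis() applies the inverse mapping to label the secondary axis. The short check below reproduces that arithmetic on its own; to_weight_axis and to_temp_axis are hypothetical helper names introduced only for this illustration.

```r
ylim.veg  <- c(0, 4000)  # range of the primary (weight) axis
ylim.temp <- c(-5, 30)   # range of the secondary (temperature) axis

b <- diff(ylim.veg) / diff(ylim.temp)   # weight units per degree
a <- ylim.veg[1] - b * ylim.temp[1]     # offset so that -5 degrees lands on 0

# Forward mapping (used in aes()) and inverse mapping (used in sec_axis())
to_weight_axis <- function(temp_c) a + temp_c * b
to_temp_axis   <- function(weight) (weight - a) / b

to_weight_axis(c(-5, 30))  # end points of the temperature range on the weight axis
to_temp_axis(c(0, 4000))   # and back again
```

Because the two functions are exact inverses, the secondary axis shows the true temperature even though the line is drawn in weight units.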
# Root Vegetable Types
roots <- veggiedata %>%
filter (Vegetable_Category == "Root Vegetable")
# To simplify the analysis, some vegetable names are combined
unique(roots$Vegetable)
## [1] "Carrots (Multicolour)" "Carrots (Orange)" "Radishes"
## [4] "Salsify" "Beetroot" "Rettich"
## [7] "Parsnip" "Turnip"
roots$Vegetable <- gsub("Rettich", "Radish", roots$Vegetable)
roots$Vegetable <- gsub("Radishes", "Radish", roots$Vegetable)
roots$Vegetable <- gsub("[()]", "", roots$Vegetable)
roots$Vegetable <- gsub("Carrots Multicolour", "Carrots", roots$Vegetable)
roots$Vegetable <- gsub("Carrots Orange", "Carrots", roots$Vegetable)
rootscolours <- c("Carrots" = "coral1", "Radish" = "maroon2", "Salsify" = "khaki3", "Beetroot" = "maroon4", "Parsnip" = "cornsilk2", "Turnip" = "cornsilk4")
plotroots <- ggplot(data=roots, aes(x=Date, y = Weight)) +
geom_bar(stat = "identity", aes(fill = Vegetable), colour = "black") +
xlab("Date") +
ylab("Weight (gr)") +
ggtitle("The Distribution of Root Vegetables Types \n throughout the Seasons of the Year") +
scale_y_continuous(breaks = seq(0,4000, by = 250)) +
scale_x_date(date_breaks = "1 month", date_labels = "%m/%y") +
scale_fill_manual(name = "Root Vegetable Type",
values = rootscolours) +
theme(plot.title = element_text(face = "bold", hjust = 0.5),
legend.title = element_blank(),
axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5),
axis.title.x = element_text(size = 10),
axis.text.y = element_text(hjust = 1.2),
axis.title.y = element_text(size = 10, vjust = 1),
# legend.position = c(0.85,0.85),
legend.key.size = unit(0.25, 'cm'),
legend.background = element_blank(),
panel.background = element_rect(fill = "grey96")) +
geom_vline(xintercept = xint,
linetype = "dashed") +
  # annotate() draws each season label once instead of once per data row
  annotate("text",
           x = as.Date(c("2021-01-15", "2021-04-15", "2021-07-15", "2021-10-15")),
           y = 3600,
           label = c("Winter", "Spring", "Summer", "Autumn"))
# plotroots
temp <- weather2 %>%
  filter(Condition == "Temperature_DailyAverage")
ylim.roots <- c(0, 4000)
ylim.temp <- c(-5, 30)
# slope and intercept for mapping temperature onto the weight axis
# (named so that they do not mask base::c())
slope.roots <- diff(ylim.roots)/diff(ylim.temp)
intercept.roots <- ylim.roots[1] - slope.roots*ylim.temp[1]
rootstemp <- plotroots +
  geom_line(data = temp, aes(y = intercept.roots + Measurement*slope.roots, colour = after_stat(y)),
            linetype = "solid", alpha = 0.6, show.legend = FALSE) +
  geom_smooth(method = "loess", data = temp, aes(y = intercept.roots + Measurement*slope.roots, colour = after_stat(y)),
              linetype = "solid", alpha = 0.6, show.legend = FALSE) +
  scale_y_continuous("Weight (gr)",
                     sec.axis = sec_axis(~(. - intercept.roots)/slope.roots,
                                         name = "Average Daily Temperature (C)")) +
  scale_colour_gradient(low = "purple", high = "orange")
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
rootstemp
## `geom_smooth()` using formula 'y ~ x'
# Amount of Carrots
carrotscolours <- c("Carrots (Orange)" = "turquoise4", "Carrots (Multicolour)" = "mediumpurple2")
# Setting the order for the bar chart
vegorder <- unique(veggiedata$Vegetable)
vegorder2 <- vegorder[!vegorder %in% c("Carrots (Orange)", "Carrots (Multicolour)")]
vegorder3 <- c(vegorder2, "Carrots (Multicolour)", "Carrots (Orange)")
veggiedata2 <- veggiedata %>%
arrange(Date, factor(Vegetable, levels = vegorder3))
veggiedata2$Vegetable <- factor(veggiedata2$Vegetable, levels = vegorder3)
# Calculating percentages
basketweight <- veggiedata2 %>%
group_by(Date) %>%
summarise(Weight = sum(Weight))
veggieweight <- merge(veggiedata2, basketweight, by = "Date", all = TRUE)
colnames(veggieweight) <- c("Date", "Vegetable_Category", "Vegetable", "Weight", "Basket_Weight")
# mutate() adds the percentage column while keeping all rows and existing columns;
# all numeric columns are then rounded to one decimal place
veggieweight2 <- veggieweight %>%
  mutate(Percent = (Weight / Basket_Weight)*100) %>%
  mutate(across(where(is.numeric), ~round(.x, 1)))
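The share of each vegetable in its basket can also be computed without a separate merge() step, by using a grouped mutate(). The sketch below shows this on an invented three-row data frame; toy and toy_share are hypothetical names, and this is an alternative illustration, not the code used above.

```r
library(dplyr)

# Invented basket contents for two dates
toy <- data.frame(
  Date      = as.Date(c("2021-05-04", "2021-05-04", "2021-05-11")),
  Vegetable = c("Carrots (Orange)", "Leek", "Carrots (Orange)"),
  Weight    = c(500, 1500, 1000)
)

toy_share <- toy %>%
  group_by(Date) %>%
  mutate(Basket_Weight = sum(Weight),                          # total per basket
         Percent = round(Weight / Basket_Weight * 100, 1)) %>% # share of that total
  ungroup()

toy_share
```

Within a grouped mutate(), sum(Weight) is evaluated per date, so every row receives the total of its own basket.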
# Plotting
plotcarrots <- ggplot(veggieweight2, aes(x=Date, y = Weight, fill = Vegetable)) +
geom_bar(stat = "identity") +
xlab("Date") +
ylab("Weight (gr)") +
ggtitle("Percentage of Carrots in each \n Vegetable Basket throughout the Seasons of the Year") +
scale_y_continuous(breaks = seq(0,8000, by = 500)) +
scale_x_date(date_breaks = "1 month", date_labels = "%m/%y") +
scale_fill_manual(values = carrotscolours) +
theme(plot.title = element_text(face = "bold", hjust = 0.5),
legend.position = "none",
legend.title = element_blank(),
axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5),
axis.title.x = element_text(size = 10),
axis.text.y = element_text(hjust = 1.2),
axis.title.y = element_text(size = 10, vjust = 1),
panel.background = element_rect(fill = "grey96")) +
geom_vline(xintercept = xint, linetype = "dashed") +
  # annotate() draws each season label once instead of once per data row
  annotate("text",
           x = as.Date(c("2021-01-15", "2021-04-15", "2021-07-15", "2021-10-15")),
           y = 7500,
           label = c("Winter", "Spring", "Summer", "Autumn"))
# +
# geom_text(aes(label = ifelse(c(Vegetable == "Carrots (Orange)" | Vegetable == "Carrots (Multicolour)"), Percent, "")),
# position = position_stack(vjust = 0.5),
# angle = 90,
# size = 1.5)
plotcarrots
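# ggplotly() selects the hover content through its `tooltip` argument;
# only aesthetics that are mapped in the plot (here x, y, and fill) can be shown
plotlycarrots <- ggplotly(plotcarrots,
                          tooltip = c("x", "y", "fill"))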
plotlycarrots
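To show a computed value such as the percentage in the hover box, a common pattern is to map a label to the text aesthetic in ggplot() and then request it with tooltip = "text" in ggplotly(). The sketch below demonstrates this on invented data; toy, p_toy, and widget are hypothetical names, and the label format is just one possible choice.

```r
library(ggplot2)
library(plotly)

# Invented weights and percentages for two dates
toy <- data.frame(
  Date      = as.Date(c("2021-05-04", "2021-05-04", "2021-05-11")),
  Vegetable = c("Carrots (Orange)", "Leek", "Carrots (Orange)"),
  Weight    = c(500, 1500, 1000),
  Percent   = c(25, 75, 100)
)

# `text` is not drawn by ggplot2; it is only picked up by ggplotly()
p_toy <- ggplot(toy, aes(x = Date, y = Weight, fill = Vegetable,
                         text = paste0(Vegetable, ": ", Percent, "%"))) +
  geom_col()

widget <- ggplotly(p_toy, tooltip = "text")
widget
```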
# adding weight=0 for multicoloured carrots to all rows/dates for which no multicoloured carrots were present in the basket
# multicarrots <- filter(veggiedata, Vegetable == "Carrots (Multicolour)")
# mx <- nrowbasket_nocarrots - nrow(multicarrots)
# mv <- c(NA, "Root Vegetable", "Carrots (Multicolour)", 0)
# rep.row <- function(mv, n) {
# matrix(rep(mv, each = mx), nrow = mx)
# }
# multi_zero <- as.data.frame(rep.row(mv, mx))
# colnames(multi_zero) <- c("Date", "Vegetable_Category", "Vegetable", "Weight")
# multicarrots2 <- rbind(multicarrots, multi_zero)
# multicarrots2$Weight <- as.numeric(multicarrots2$Weight)